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Abstract 

Stochastic volatility modelling of financial processes has become in- 
creasingly popular. The proposed models usually contain a stationary 
volatility process. We will motivate and review several nonparametric 
methods for estimation of the density of the volatility process. Both 
models based on discretely sampled continuous time processes and dis- 
crete time models will be discussed. 

The key insight for the analysis is a transformation of the volatility 
density estimation problem to a deconvolution model for which stan- 
dard methods exist. Three type of nonparametric density estimators 
are reviewed: the Fourier-type deconvolution kernel density estimator, 
a wavelet deconvolution density estimator and a penalized projection 
estimator. The performance of these estimators will be compared. 

Key words: stochastic volatility models, deconvolution, density estima- 
tion, kernel estimator, wavelets, minimum contrast estimation, mixing 
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1 Introduction 



We discuss a number of nonparametric methods that come into play when 
one wants to estimate the density of the volatility process, given observations 
of the price process of some asset. The models that we treat are mainly 
formulated in continuous time, although we pay some separate attention 
to discrete time models. The observations of the continuous time models 
will always be in discrete time however and may occur at low frequency 
(fixed lag between observation instants), or high frequency (vanishing time 
lag). In this review, for simplicity we focus on the univariate marginal 
distribution of the volatility process, although similar results can be obtained 
for multivariate marginal distributions. 

Although the underlying models differ in the sense that they are for- 
mulated either in continuous or in discrete time, in all cases the observations 
are given by a discrete time process. Moreover, as we shall see, the obser- 
vation scheme can always (approximately) be cast as of 'signal plus noise' 
type 

Yi = Xi + e i , 

where Xi is to be interpreted as the 'signal'. If for fixed i the random vari- 
ables Xi and Si are independent, the distribution of the Y% is a convolution 
of the distributions of Xi and £j. The density of the 'signal' Xi is the object 
of interest, while the density of the 'noise' e% is supposed to be known to the 
observer. The statistical problem is to recover the density of the signal by 
deconvolution. Classically for such models it was often also assumed that 
the processes (Xi) and (s^ are i.i.d. Under these conditions Fan [12] gave 
lower bounds for the estimation of the unknown density / at a fixed point 
xq and showed that kernel-type estimators achieve the optimal rate. An 
alternative estimation method was proposed in the paper Pensky and Vi- 
dakovic [23] , using wavelet methods instead of kernel estimators and where 
global L 2 -errors were considered instead of pointwise errors. 

However, for the stochastic volatility models that we consider, the i.i.d. 
assumption on the Xi is violated. Instead, the Xi may be modelled as sta- 
tionary random variables, that are allowed to exhibit some form of weak 
dependence, controlled by appropriate mixing properties, strongly mixing 
or /3-mixing. These mixing conditions are justified by the fact that they 
are satisfied for many popular GARCH-type and stochastic volatility mod- 
els (see e.g. Carrasco and Chen [B]), as well as for continuous time models, 
where a 2 is solves a stochastic differential equation, see e.g. Genon-Catalot 
et al. [17J. The estimators that we discuss are based on kernel methods, 
wavelets and penalized contrast estimation, also referred to as penalized 
projection estimation. We will review the performance of these deconvo- 
lution estimators under weaker than i.i.d. assumptions and show that this 
essentially depends on the smoothness and mixing conditions of the under- 
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lying process and the frequency of the observations. For a survey of other 
nonparametric statistical problems for financial data we refer to Franke et 
al. PH 

The paper is organized as follows. In Section [2] we introduce the con- 
tinuous time model. In Section [3] we consider a kernel type estimator of the 
invariant volatility density and apply it to a set of real data. Section [4] is 
devoted to a wavelet density estimator and in Section [5] a minimum contrast 
estimator is discussed. Some related results for discrete time models are 
reviewed in Section [6] and Section [7] contains some concluding remarks. 



2 The continuous time model 



Let S denote the log price process of some stock in a financial market. It 
is often assumed that S can be modelled as the solution of a stochastic 
differential equation or, more general, as an Ito diffusion process. So we 
assume that we can write 

dSt = b t dt + a t dWt, S = 0, (1) 

or, in integral form, 

S t = f b s ds+ I a s dW s , (2) 
Jo Jo 

where TV is a standard Brownian motion and the processes b and a are as- 
sumed to satisfy certain regularity conditions (see Karatzas and Shreve |22j ) 
to have the integrals in ^ well-defined. In a financial context, the process 
a is called the volatility process. One often takes the process a independent 
of the Brownian motion W. 

Adopting this common assumption throughout the paper, unless explic- 
itly stated otherwise, we also assume that a is a strictly stationary positive 
process satisfying a mixing condition, for example an ergodic diffusion on 
(0,oo). We will assume that the one-dimensional marginal distribution of 
a has an invariant density with respect to the Lebesgue measure on (0, oo). 
This is typically the case in virtually all stochastic volatility models that 
are proposed in the literature, where the evolution of a is modelled by a 
stochastic differential equation, mostly in terms of a 2 , or logo -2 (cf. e.g. 
Wiggins jH], Heston [20). Often a\ is a function of a process Xt satisfying 
a stochastic differential equation of the type 

dX t = b{X t )dt + a{X t )dB u (3) 

with Bt a Brownian motion. Under regularity conditions, the invariant 
density of X is up to a multiplicative constant equal to 

-•• 1 e Xp ( 2 f4^d,Y (4) 
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where xq is an arbitrary element of the state space, see e.g. Gihman and 
Skorohod [19 or Skorokhod [25] . From formula Q one sees that the invari- 
ant distribution of the volatility process (take X for instance equal to a 2 
or logu 2 ) may take on many different forms, as is the case for the various 
models that have been proposed in the literature. In absence of parametric 
assumptions on the coefficients a and 6, we will investigate nonparametric 
procedures to estimate the corresponding densities, even refraining from an 
underlying model like Q, partly aimed at recovering possible 'stylized facts' 
exhibited by the observations. 

For instance, one could think of volatility clustering. This may be cast 
by saying that for different time instants ti,t2 that are close, the corre- 
sponding values of at 1 ,a , t 3 are close again. This can partly be explained by 
assumed continuity of the process a, but it might also result from specific 
areas around the diagonal where the multivariate density of (cr^,^) as- 
sumes high values if ti and £2 are relatively close. It is therefore conceivable 
that the density of (o"t 1 ,at 2 ) has high concentrations around points (£,£) 
and (h,h), with £ < h, a kind of bimodality of the joint distribution, with 
the interpretation that clustering occurs around a low value £ or around a 
high value h. This in turn may be reflected by bimodality of the univariate 
marginal distribution of ctj. 

A situation in which this naturally occurs is the following. Consider a 
regime switching volatility process. Assume that for i = 0,1 we have two 
stationary processes X 1 having stationary densities f l . We assume these two 
processes to be independent, and also independent of a two-state stationary 
homogeneous Markov chain U with states 0,1. The stationary distribution 
of U is given by 7Tj := P(Ut = i). The process £ is defined by 

& = U t X l t + (1 - u t )x°. 

Then £ is stationary too and it has a stationary density / given by 

f(x)=7T 1 f\x)+7r f°(x). 

Suppose that the volatility process is defined by of = exp(£ t ) and that the 
X % are both Ornstein-Uhlenbeck processes given by 

dXl = -bi(Xi -m)dt + ai dWj, 

with W l , W 2 independent Brownian motions, )Ui 7^ H2 and 61,62 > 0. 

a 2 

Suppose that the X 1 start in their stationary iV(/Xj, ^) distributions. Then 
the stationary density / is a bimodal mixture of normal densities with 
and [12 as the locations of the local maxima. Nonparametric procedures are 
able to detect such a property and are consequently by all means sensible 
tools to get some first insights into the shape of the invariant density. 

A first object of study is the marginal univariate distribution of the 
stationary volatility process a. The standing assumption in all what follows 
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is that this distribution admits a density w.r.t. Lebesgue measure. We will 
also consider the invariant density of the integrated squared volatility pro- 
cess over an interval of length A. By stationarity of a this is the density 
of J of dt. We will consider density estimators and assess their quality by 
giving results on their mean squared or integrated mean squared error. For 
kernel estimators, we rely on Van Es et al. |10j . where this problem has been 
studied for the marginal univariate density of a. In Van Es and Spreij [9] one 
can find results for multivariate density estimators. Results on wavelet esti- 
mators will be taken from Van Zanten and Zareba [32]. Penalized contrast 
estimators have been treated in Comte and Genon-Catalot [7]. 

The observations of log-asset price S process are assumed to take place 
at the time instants 0, A, 2 A, . . . , nA. In case one deals with low frequency 
observations, A is fixed. For high frequency observations, the time gap 
satisfies A = A n — > as n — > oo. To obtain consistency for the estimators 
that we will study in the latter case, we will make the additional assumption 
nA n — > oo. 

To explain the origin of the estimators that we consider in this paper, 
we often work with the simplified model, which is obtained from ([I]) by 
taking bt = 0. We then suppose to have discrete-time data So, Sa, S2A, ■ ■ ■ 
from a continuous-time stochastic volatility model of the form 

dS t = a t dW t . 

Under this additional assumption, we will see that we (approximately) deal 
with stationary observations Yi that can be represented as Yi = Xi + Ej, 
where for each i the random variables Xi and Si are independent. 

3 Kernel deconvolution 

In this section we consider kernel deconvolution density estimators. We con- 
struct them, give expressions for bias and variance and give an application 
to real data. 

3.1 Construction of the estimator 

To motivate the construction of the estimator, we first consider ([I]) without 
the drift term, so we assume to have the simplified model 

dS t = a t dW t , 5 = 0. (5) 

It is assumed that we observe the process S at the discrete time instants 0, 
A, 2A, . . . , nA, satisfying A —* 0, nA — ► 00. For i = 1, 2, ... we work, as in 
Genon-Catalot et al. |15 | 116 ) . with the normalized increments 

X^ = -j^(SiA - <%-i)a)- 
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For small A, we have the rough approximation 



r-iA 

'A _ 1 



v ^ J(i-l)A 

~ 0-(i-l)A^(Wi A - W (i _i) A ) (6) 



where for i = 1, 2, ... we define 

f 



By the independence and stationarity of Brownian increments, the sequence 
Z^, Z2, ... is an i.i.d. sequence of standard normal random variables. More- 
over, the sequence is independent of the process a by assumption. 

Writing Yj = log(X^) 2 , & = logo|._ 1)A , £i = log(Z A ) 2 and taking the 
logarithm of the square of X A we get 
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where the terms in the sum are independent. Assuming that the approxima- 
tion is sufficiently accurate we can use this approximate convolution struc- 
ture to estimate the unknown density / of logc 2 A from the transformed 
observed Yi = log(X A ) 2 . The characteristic functions involved are denoted 
by 4>y, <p£ and 4>k, where k is the density of the 'noise' log(Z A ) 2 . One 
obviously has 4>y = <t>£<t>k an d one easily sees that the density k is given by 

1 1 _i, 
k(x) = —^= e2 x e 2 e , 
\Z2tt 

and its characteristic function by 



The idea of getting a deconvolution estimator of / is simple. Using 
a kernel function w, a bandwidth h, and the Yi, the density g of the Yi is 
estimated by 

1 S^...,V-Yj, 



9nh(y) = -7- E 



nh ' h ' 
j 

Denoting 4> g ^ n h the characteristic function of g n h, one estimates (fry by 4>g,nh 
and 0£ by 4> g . n h/4>k- Following a well-known approach in statistical decon- 
volution theory (see e.g. Section 6.2.4 of Wand and Jones [30 J, Fourier 
inversion then yields the density estimator of /. By elementary calculations 
one obtains from this procedure 



fnh(x) = iy>fc , (7) 
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where Vh is the kernel function, depending on the bandwidth h, 

Vh ( x ) = If -ML e -i- ds . ( 8 ) 

One easily verifies that the estimator f n h, is real- valued. 

To justify the approximation in (|6]), we quantify a stochastic continuity 
property of a 2 . In addition to this we make the mixing condition explicit. 
We impose 

Condition 3.1. The process a 2 satisfies the following conditions. 

1. It is L 1 -H61der continuous of order one half, E|of — a 2 \ = 0(i 1//2 ) for 
t -> 0. 

2. It is strongly mixing with coefficient a(t) satisfying, for some < q < 
1, 

/•oo 

/ a(t) q dt<oo. (9) 
Jo 

The kernel function it; is assumed to satisfy the following conditions (an 



example of such a kernel is given in (12) below, see also Wand [29]), which 



includes in particular the behavior of (f) w at the boundary of its domain. 

Condition 3.2. Let robea real symmetric function with real valued sym- 
metric characteristic function cf) w with support [-1,1]. Assume further 

J-oo \ w ( u )\ du < 00 > f-oc w{u)du = 1 , u 2 \w(u)\du < oo , 

2. 0^(1 -t) = AtP + o(tP), as t | for some p > 0, A G R. 



The first part of Condition 3.1 is motivated by the situation where 



X = a 2 solves a SDE like ([T]). It is easily verified that for such processes 
it holds that E\a 2 — a 2 \ = 0(t 1 / 2 ), provided that b G L\(p) and a G L 2 (p), 
where p is the invariant probability measure. Indeed we have E|cr| — Oq| < 

The main result we present for this estimator concerns its mean squared 
error at a fixed point x. Although the motivation of the estimator was based 
on the simplified model ([5]), the result below applies to the original model ([!]). 
For its proof and additional technical details, see Van Es et al. |10| . 



Theorem 3.3. Assume that Kb 2 is bounded. Let the process a satisfy Con- 



the density f of log a 2 be twice continuously differentiable with a bounded 
second derivative. Also assume that the density of a 2 is bounded in a neigh- 



dition 3.1, and let the kernel function w satisfy Condition 3.2. Moreover, let 

assume that the density of o t 
bourhood of zero. Suppose that A = n _<5 for given < 5 < 1 and choose 
h = 77r/logn, where 7 > 4/5. Then the bias of the estimator ([?]) satisfies 

Ef nh {x) - f{x) = \h 2 f"{x) J u 2 w{u)du + o(h 2 ), (10) 
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whereas, the variance of the estimator satisfies the order bounds 



Var/w,W = o(W*) +0 (^). (11) 

Remark 3.4. The choices A = n~ s , with < S < 1 and h = jn/logn, with 
7 > 4/(5 render a variance that is of order n~ 1+1 / 7 (l/logn) 2p for the first 



term of (11 ) and n~ 1+5 (log n) 1+q for the second term. Since by assumption 
7 > 4/(5 we have 1/7<(5/4<(5so the second term dominates the first term. 
The order of the variance is thus n~ 1+5 (log n) 1+q . Of course, the order of 
the bias is logarithmic, hence the bias dominates the variance and the mean 
squared error of f n h(x) is of order (logn) -4 . 



Remark 3.5. It can then be shown that for the characteristic function (fik 
one has the behavior 

= >/2e-**W(l + O(^)), \s\ - oo. 

This means that k is supersmooth in the terminology of Fan [12] which 
explains the slow logarithmic rate at which the bias vanishes. Sharper results 
on the variance can be obtained when a 2 is strongly mixing, see Van Es et 
al. for further details. The orders of the bias and of the MSE remain 
unchanged though. 



3.2 An application to the Amsterdam AEX index 

In this section we present an example using real data of the Amsterdam AEX 
stock exchange. We have estimated the volatility density from 2600 daily 
closing values of the Amsterdam stock exchange index AEX from 12 / 03/1990 
until 14/03/2000. These data are represented in Figure [TJ We have cen- 
tered the daily log returns, i.e we have subtracted the mean (which equaled 
0.000636), see Figure [2] The deconvolution estimator is given as the left 
hand picture in Figure [3] Observe that the estimator strongly indicates 
that the underlying density is unimodal. Based on computations of the 
mean and variance of the estimate, with h = 0.7, we have also fitted a nor- 
mal density by hand and compared it to the kernel deconvolution estimator. 
The result is given as the right hand picture in Figure [3] The resemblance 
is remarkable. 

The kernel used to compute the estimates is a kernel from Wand [29], 
with p = 3 and ^4 = 8, 

, . 48x(x 2 - 15) cos j; - 144(2x 2 - 5)sinx ,„„. 
w(x) = f . (12) 

TTX 1 

It has characteristic function 

w (t) = (l-t 2 ) 3 , |t|<l. (13) 
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Figure 1: AEX. Left: daily closing values. Right: log of the daily closing 
values. 
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-25 •. -25 



Figure 2: AEX. Left: the values of Xt, i.e. the centered daily log returns. 
Right: log(Xf) . 

The bandwidths are chosen by hand. The estimates have been computed 
by fast Fourier transforms using the Mathematica 4.2 package. 

This is actually the same example as in our paper Van Es et al. [11] 
on volatility density estimation for discrete time models. The estimator Q 
presented here is, as a function of the sampled data, exactly the same as 
the one for the discrete time models. The difference lies in the choice of 
underlying model. In the present paper the model is a discretely sampled 
continuous time process, while in Van Es et al. [11J it is a discrete time 
process. For the latter type of models the discretization step in the beginning 
of this section is not necessary since these models satisfy an exact convolution 
structure. 

4 Wavelet deconvolution 

As an alternative to kernel methods, in this section we consider estima- 
tors based on wavelets. Starting point is again the simplified model Q. 
Contrary to the previous section, we are now interested in estimating the 
accumulated squared volatility over an interval of length A. We assume 
having observations of S at times iA to our disposal, but now with A fixed 
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Figure 3: AEX. Left: The estimate of the density of log(<7 2 ) with h = 0.7. 
Right: The normal fit to the log (erf). The dashed line is the normal density 
and the solid line the kernel estimate. 

(low frequency observations). Let, as before, = A~ 1 ^ 2 (SiA — <S7i_i)A) 
and let af = A -1 JA'^a °t ^t- Denote by T a the cr-algebra generated by 
the process a. By the assumed independence of the processes a and W, we 
have for the characteristic function of Xf^ given T a 

E[exp(i s Xf)|^ a ] = exp(-i^ s 2 ). 

Consider also the model Xf 1 = &iZi, with &i and Z% independent for each % 
and Zi a standard Gaussian random variable. Then 

E[exp(i s Xf)|JF ai ] = exp(-i^ s 2 ). 

It follows that and X? are identically distributed. From this observation 
we conclude that the transformed increments log (A _1 (S'jA — »%-i)a) 2 ) are 
then distributed as Y, L = £j + £j, where 

Ci = loga 2 , £i = logZ 2 , 

and Zi is an i.i.d. sequence of standard Gaussian random variables, indepen- 
dent of a. The sequence is stationary and we assume that its marginal 
density g exists, i.e. g is the density of log (A -1 J" Q A a 2 duj . The density 
of the £j is again denoted by k. Of course, estimating g is equivalent to 
estimating the density of the aggregated squared volatility J" A <r 2 du. 

In the present section the main focus is on the quality of the estimator 
in terms of the mean integrated squared error, as opposed to establishing 
results for the (pointwise) mean squared error as in Section [3| At the end 
of this section we compare the results presented here to those of Section [3} 

First we recall the construction of the wavelet estimator proposed in 
Pensky and Vidakovic [23] . For the necessary background on wavelet theory, 
see for instance Blatter p], Jawerth and Sweldens (21], and the references 
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therein. For the construction of deconvolution estimators we need to use 
band-limited wavelets. As in Pensky and Vidakovic |23j we use a Meyer- 
type wavelet (see also Walter [27], Walter and Zayed [28 ). We consider an 
orthogonal scaling function and wavelet ip and ip, respectively, associated 
with an orthogonal multiresolution analysis of L 2 (R). We denote in this 
section the Fourier transform of a function / by /, i.e. 

/»= / e~^f(x)dx, 
Jr 

and suppose that for a symmetric probability measure \x with support con- 
tained in [— 7r/3,7r/3] it holds that 

/ \l/2 ~ • /0 / \i/2 

<p(u) = {n(u-Tr,u) + ir)) , ip(u) = e luJ/ z ({jl(\uj\/2 - n, \u\ - ir] J . 

Observe that the assumptions imply that ip and ip are indeed band-limited. 
For the supports of their Fourier transforms we have supp tp C [— 47r/3, 47r/3] 
and supp-0 C [— 87r/3, — 2-7r/3] U [27r/3,87r/3]. By choosing fi smooth enough 
we ensure that (p and ip are at least twice continuously differentiable. 
For any integer m, the unknown density g can now be written as 

oo 

9( x ) = ^2a mt iip mi i(x) + ^2^2 ( 14 ) 
zez zez j=m 

where <p m ,i{ x ) = 2 m / 2 (p(2 m x — I), ipji(x) = 2- ? / 2 -i/?(2 J x — I) and the coefficients 
are given by 



a m ,i = / <p m ,i( x )9(x)dx , bjj = / ip jt i(x)g{x) Ax. 

JR JR 

The idea behind the linear wavelet estimator is simple. We first approximate 
g by the orthogonal projection given by the first term on the right-hand side 



of (14). For m large enough the second term will be small, and can be 
controlled by using the approximation properties of the specific family of 
wavelets that is being used. The projection of g is estimated by replacing 
the coefficients a m \ by consistent estimators and truncating the sum. Using 
the fact that the density p of an observation Yi is the convolution of g and 
k it is easily verified that 

a m ,i= [ 2 m / 2 U m (2 m x-l)p(x)Ax = 2 m / 2 EU m (2 m Y i -l), 
Jr 

where U m is the function with Fourier transform 

*-<"> = W (15) 
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We estimate the coefficient a m \ by its empirical counterpart 



n 

-V2 m / 2 [/ m (2™y i -l). 

n ± — * 



n . 

Under the mixing assumptions that we will impose on the sequence Y, it 
will be stationary and ergodic. Hence, by the ergodic theorem, a m ^ n is a 
consistent estimator for a m ^\. The wavelet estimator is now defined by 

9n(x) = & m n ,l,n'Pm n ,l(x), (16) 

\l\<L n 

where the detail level m n and the truncation point L n will be chosen appro- 
priately later. 

The main results in the present section are upper bounds for the mean 
integrated square error of the wavelet estimator g n , which is defined as usual 
by 

MISE (g n ) = E [ (g n (x) - g{x)f dx. 



We will specify how to choose the detail level m n and the truncation point 
L n in (16) optimally in different cases, depending on the smoothness of g 
and k. The smoothness properties of g are described in terms of g belonging 
to certain Sobolev balls and by imposing a weak condition on its decay rate. 
The Sobolev space H a is defined for a > by 

H a = \g: || 5 || Q = ( / |5H| 2 (^ 2 + l) a d^ 1/2 <oo). (17) 



Roughly speaking, g E H a means that the first a derivatives of g belong to 
L 2 (M). The Sobolev ball of radius A is defined by 

^a(A) ={geH a : \\g\\ a < A} . 

The additional assumption on the decay rate is reflected by g belonging to 

y*(A,A') = y a (A) n {g : sup|x 5 (x)| < A'}. 

We now have the following result, see Van Zanten and Zareba [32], for the 
wavelet density estimator g n of g defined by (16). 

Theorem 4.1. Suppose that the volatility process a 2 is strongly mixing with 
mixing coefficients satisfying 

E«£a <0 ° ( 18 ) 

fc>0 
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for some p E (0, 1). Then with the choices 



log n 

L n = log n) , r > 1 + 2a 



" 1 + (4vr 2 /3) ' 
the mean square error of the wavelet estimator satisfies 

sup MISE(5„) = o((logn)- 2Q 



for a, A, A' > 0. // (18) is satisfied for all p S (0, 1), the same bound is true 



if the choice for L n is replaced by L n = n. 

Let us point out the relation with the results of Section [3] and with 



those in Van Es et al. [TT], see also Section 6.1 In that paper kernel-type 
deconvolution estimators for discrete time stochastic volatility models were 
considered. When applied to the present model the results say that under the 
same mixing condition and assuming that g has two bounded and continuous 
derivatives, the (pointwise) mean squared error of the kernel estimator is of 
order (logra) -4 . The analogue of g having two bounded derivatives in our 
setting is that g G S^^A^A 1 ) for some A, A' > 0. Indeed, the theorem 
yields the same bound (logn) -4 for the MISE in this case. The same bound 
is valid for the MSE when estimating the marginal density for continuous 



time models, see Theorem 3.3 and its consequences in Remark |3.4| Theorem 
4.1 is more general, because the smoothness level is not fixed at a = 2, but 
allows for different smoothness levels of order a / 2 as well. Moreover, the 
wavelet estimator is adaptive in the sense that it does not depend on the 
unknown smoothness level, if the condition on the mixing coefficients holds 
for all p G (0, 1). 



5 Penalized projection estimators 

The results of the preceding sections assume that the true (integrated) 
volatility density has a finite degree of regularity, either in Holder or in 
Sobolev sense. Under this assumption the nonparametric estimators have 



logarithmic convergence rates, cf. Remark |3.4| and Theorem 4.1 Although 
admittedly slow, the minimax results of Fan [12] show that these rates are in 
fact optimal in this setting. In the paper Pensky and Vidakovic [23] it was 
shown however that if in a deconvolution setting the density of the unob- 
served variables has the same degree of smoothness as the noise density, the 
rates can be significantly improved, cf. also the lower bounds obtained in Bu- 
tucea 0] and Butucea and Tsybakov [5] . This observation forms the starting 
point of the paper Comte and Genon-Catalot J?], in which a nonparamet- 
ric volatility density estimator is developed that achieves better rates than 
logarithmic if the true density is super smooth. 
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In the latter paper it is assumed that there are observations 
<Sa , S2 a , • ■ • , S n a of a process S satisfying the simple equation with 
V = a 2 a (0, oo)-valued process independent of the Brownian motion W. 
It is assumed that we deal with high frequency observations, A — ► and 
nA — ► 00. We impose the following condition on V. 

Condition 5.1. The process V is a time-homogenous, continuous Markov 
process, strictly stationary and ergodic. It is either /3-mixing with coefficient 
j3{t) satisfying 

roc 

/ 0(t)dt<oo, 
Jo 

or is p-mixing. Moreover, it satisfies the Lipschitz condition 
E ( log (i/ Ay ' dt ) - lo g^o) 2 <CA, 

for some C > 0. 

In addition to this a technical assumption is necessary on the density / of 
log Vq we are interested in and on the density g a of log ^ ^ J Q A Vt dt^j , which 
is assumed to exist. Contrary to the notation of the previous section, we 
write gA instead of g, since now A is not fixed. 

Condition 5.2. The invariant density / is bounded and has a second mo- 
ment and g a G L 2 (R). 

As a first step in the construction of the final estimator a preli mina ry esti- 



mator fi is constructed for Let} fixed. Note that Condition 5.2 implies 
that / e L 2 (R), hence we can consider its orthogonal projection /x on 
the subspace Sl of L 2 (M), defined as the space of functions whose Fourier 
transform is supported on the compact interval [— nL, ttL]. An orthonor- 
mal basis for the latter space is formed by the Shannon basis functions 
4>L,j(%) = VLip(Lx — j), j G Z, with tj)(x) = sin(7rx)/(7nr) the sine kernel. 
For integers K n — > 00 to be specified below, the space Sl is approximated 
by the finite-dimensional spaces = spanj^Lj : \j\ < K n }. The function 
Jl is estimated by fa = argmin^g^n j n (h), where the contrast function 7 n 
is defined for h G L 2 (R) n L 1 (IR) by 

luih) = \\h\\ 2 - -^n,(log(^ A ) 2 ), u h (x) = — / e^^tTd.. 



i=i 

-2 



Here, as before, 4>k is the characteristic function of loge , with e standard 
normal and h is the Fourier transform of h. It is easily seen that 

1 n 

h = Yl h L,j^L,j, a Ld = -^2u^ L .(log(X^) 2 ). 

\j\<K n 3=1 
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Straightforward computations show that, with (•, •) the L 2 (M) inner product, 
Euh(log(X^) 2 ) = (h,g&), and hence Ej n (h) = \\h - #a||| - ||<?a|||- So in 
fact, /l is an estimator of the element of 5£ which is closest to $a- Since 
Sj? approximates Sl for large n and #a is close to / for small A, the latter 
element should be close to /l. 

Under Conditions |5.1| and |5.2| a bound for the mean integrated square 
error, or quadratic risk MISE(/i) = E||/^ — f\\% can be derived, depending 
on the approximation error [|/ — /l\\2, the bandwidth L and the truncation 
point K n , see Comte and Genon-Catalot [7j, Theorem 1. The result implies 



that if / belongs to the Sobolev space H a as defined in (17), then the choices 
K n = n and L = L n ~ logn yield a MISE of the order (logn) _2a , provided 
that A = A n = for some 5 £ (0, 1). Not surprisingly, this is completely 



analogous to the result obtained in Theorem 4.1 for the wavelet-based es- 
timator in the fixed A setting. In particular the procedure is adaptive, in 
that the estimator does not depend on the unknown regularity parameter 
a. 

To obtain faster than logarithmic rates and adaptation in the case that 
/ is supersmooth, a data-driven choice of the bandwidth L is proposed. 
Define 



L= argmin ( j n (f L ) + pen n (L) 

Le{l,...,logn} 



;i + L)$ fc (L) 



where the penalty term is given by 

pen n (L) = / 

n 

for k > a calibration constant and 

r L i 

$k(L) = / ds. 

For the quadratic risk of the estimator f^, the following result holds (Comte 
and Genon-Catalot [7]). 

Theorem 5.3. Under Conditions \5.2\ and \5.1\ we have 

WSE(f t )< Cl mf ri|/-/xl|i+ (1 + L)$fc(L) 
Le{i,...,iogn} V n 

_ log 2 n „ log n _ . . o 
+ <^2^^ + + C 4 A log 3 n, 

K n nA 

/or constants C\, C*2, C3, C4 > 0. 

It can be seen that this bound is worse than the corresponding bound 
for the estimator fi by a factor of the order L. This is at worst a logarithmic 
factor which, as usual in this kind of setting, has to be paid for achieving 
adaptation. The examples in Section 6 of Comte and Genon-Catalot |7 
show that indeed, the estimator ft can achieve algebraic convergence rates 
in case the true density / is supersmooth. 
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6 Estimation for discrete time models 



Although the main focus of the present paper is on estimation procedures 
for continuous time models, in the present section we also highlight some 
analogous results for discrete time models. These deal with both density 
and regression function estimation. 

6.1 Discrete time models 

The discrete analogue of ([5| is 

X t = a t Z t ,t=l,2,.... (19) 

Here we denote by X the detrended or demeaned log-return process. 
Stochastic volatility models are often described in this form. The sequence 
Z is typically an i.i.d. noise (e.g. Gaussian) and at each time t the random 
variables at and Z t are independent. See the survey papers by Ghysels et 
al. (18] or Shephard [21]. Also in this section we assume that the process a 
is strictly stationary and that the marginal distribution of a has a density 
with respect to the Lebesgue measure on (0,oo). We present some results 
for a nonparametric estimator of the density of log a 2 , as well as results for 
a nonparametric estimator of a nonlinear regression function, in case a 2 is 
given by a nonlinear autoregression. The standing assumption in all what 
follows is that for each t the random variables at and Zt are independent, the 
noise sequence is standard Gaussian and a is a strictly stationary, positive 
process satisfying a certain mixing condition. 

In principle one can distinguish two classes of models. The way in 
which the bivariate process (a, Z), in particular its dependence structure, is 
further modelled offers different possibilities. In the first class of models one 
assumes that the process a is predictable with respect to the filtration Tt 
generated by the process Z, and obtains that at is independent of Zt for each 
fixed time t. We furthermore have that (assuming that the unconditional 
variances are finite) a 2 is equal to the conditional variance of Xt given 
J-t-x- This class of models has become quite popular in the econometrics 
literature. It is well known that this class also contains the (parametric) 
family of GARCH-models, introduced by Bollerslev [2]. 

In the second class of models one assumes that the whole process a is 
independent of the noise process Z, and one commonly refers to the resulting 
model as a stochastic volatility model. In this case, the natural underlying 
filtration T = {^}t>o is generated by the two processes Z and a in the 
following way. For each t the cr-algebra Tt is generated by Z s , s < t and 
a s , s < t + 1. This choice of the filtration enforces a to be predictable. As 
in the first model the process X becomes a martingale difference sequence 
and we have again (assuming that the unconditional variances are finite) 
that a 2 is the conditional variance of Xt given J~t-i- An example of such a 
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model is given in De Vries [SB], where o is generated as an AR(1) process 
with a-stable noise (a € (0, 1)). 

As in the previous sections we refrain from parametric modelling and 
review some completely nonparametric approaches. We will mainly focus 
on results for the second class, as it is the discrete time analogue of the 
stochastic volatility models of the previous sections. At the heart of all 



what follows is again the convolution structure that is obtained from (19) 
by squaring and taking logarithms, 

logX 2 = logo 2 + logZ 2 . 
6.2 Density estimation 

The main result of this section gives a bias expansion and a variance bound 
of a kernel density type estimator of the density / of log of , which chosen 
to be, analogously to ([7]), 

, . 1 A /x-log(X 7 ) 2 \ . , 

3=1 V 7 

where Vh is the kernel function of Q. 

The next theorem is derived from Van Es et al. [11], where a multi- 
variate density estimator is considered. It establishes the expansion of the 
bias and an order bound on the variance of our estimator under a strong 
mixing condition. Under broad conditions this mixing condition is satisfied 
if the process o Markov, since then convergence of the mixing coefficients 
to zero takes place at an exponential rate, see Theorems 4.2 and Theo- 
rem 4.3 of Bradley [3] for precise statements. Similar behaviour occurs 
for ARMA processes with absolutely continuous distributions of the noise 
terms (Bradley [3], Example 6.1). 

Theorem 6.1. Assume that the process o is strongly mixing with coefficient 
ak satisfying 



P ^ 
or- < oo, 



for some (5 G (0, 1). Let the kernel function w satisfy Condition 3.2 and let 
the density f of log a 2 be bounded and twice continuously differentiate with 
bounded second order partial derivatives. Assume furthermore that o and Z 
are independent processes. Then we have for the estimator of f defined as 



in (20) and h — ► 

Ef nh {x) = f{x) + \h 2 f"{x) f u 2 w(u) du + o(h 2 ) (21) 

and 

Var/ n ,(x) = 0(i/ i 2 "e^). (22) 



17 



Remark 6.2. Comparing the above results to the ones in Theorem 3.3 



we observe that in the continuous time case, the variance has an additional 
°(^PT5a) term - 

6.3 Regression function estimation 



In this section we assume the basic model (19), but in addition we assume 
that the process a satisfies a nonlinear autoregression and we consider non- 
parametric estimation of the regression function as proposed in Franke et 
al. [13]. In that paper a discrete time model was proposed as a discretiza- 
tion of the continuous time model given by Q. In fact, Franke et al. include 
a mean parameter fi, but since they assume it to be known, without loss 

h. 

■2 



of generality we can still assume (19). Assume that the volatility process 



is strictly positive and consider log 07. It is assumed that its evolution is 
governed by 

lo g °f+ 1 = m( l °g °t) + m, ( 23 ) 

where the r\t are i.i.d. Gaussian random variables with zero mean. The 
regression function m is assumed to satisfy the stability condition 

,m(x) , 

limsup|^^| < 1. (24) 

\x\— >oo 

Under this condition the process a is exponentially ergodic and strongly 
mixing, see Doukhan [8 and these properties carry over to the process X 
as well. Moreover, the process log of admits an invariant density /. 
Denoting Yt = log , we have 

Y t = loga t 2 + logZ t 2 . 

It is common to assume that the processes Z and r] are independent, the 
second class of models described in Section |6.1[ but dependence between r\t 
and Zt for fixed t can be allowed for (first model class) without changing in 
what follows, see Franke et al. |13j . 



The purpose of the present section is to estimate the function m in (23). 



To that end we use the estimator f n h as defined in (20). Since this estimator 
resembles an ordinary kernel density estimator, the important difference 
being that the kernel function Vh now depends on the bandwidth h, the idea 
is to mimic the classical Nadaraya- Watson regression estimator similarly, in 
order to obtain an estimator of m(x). Doing so, one obtains the estimator 

m nh {x) = " fe ^ =i ; fe * )Yi+ \ (25) 



It follows that 



fnh{x) 
1 \ I \ Pnh\ x ) 

m nh (x) - m[x) = , 
Jnh{x) 
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where 

1 J—^ x — Y- 
Pnh{x) = -r22v h ( 3 )(Yj + i-m(x)). 

3=1 

In Franke et al. [13] bias expansions for p n h( x ) an d fnh are given that fully 



correspond to those in Theorem 6.1 They are again of order h 2 , under 



similar assumptions. It is also shown that the variances of p n h and f n h tend 
to zero. The main result concerning the asymptotic behavior then follows 
from combining the asymptotics for p n h and f n h- 



Theorem 6.3. Assume that m satisfies the stability condition (24), that m 
and f are twice differentiable and the first of Condition \3.2\ on the kernel 
w. The estimator m n h(%) satisfies (log n) 2 (m n / l (x) — m(x)) = O p (l) if h = 
7/ log n with 7 > 7r. 

Following the proofs in Franke et al. [13] , one can conclude that e.g. the 
variance of p n h is of order 0( cxp ^{ h ^ ), which tends to zero for h = 7/logra, 
with 7 > 7r. For the variance of f n h a similar bound holds. Comparing 



these order bounds to the ones in Theorem 6.1 we see that the latter ones 
are sharper. This is partly due to the fact that Franke et al. [13], don't 
impose conditions on the boundary behavior of the function <j) w (the sec- 



ond of Condition 3.2), whereas their other assumptions are the same as in 
Theorem 16.11 



7 Concluding remarks 

In recent years, many different parametric stochastic volatility models have 
been proposed in the literature. To investigate which of these models are 
best supported by observed asset price data, nonparametric methods can 
be useful. In this paper we reviewed a number of such methods that have 
recently been proposed. The overview shows that ideas from deconvolution 
theory can be instrumental in dealing with this statistical problem and that 
both for high and for low frequency data, methods are now available for 
nonparametric estimation of the (integrated) volatility density at optimal 
convergence rates. 

On a critical note, the methods available so far all assume that the 
volatility process is independent of the Brownian motion driving the asset 
price dynamics. This is a limitation, since in several interesting models non- 
zero correlations are assumed between the Brownian motions driving the 
volatility dynamics and the asset price dynamics. 
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